We consider the word collector problem, i.e. the expected number of calls to a random weighted generator before all 
the words of a given length in a language are generated. The originality of this instance of the non-uniform coupon 
collector lies in the, potentially large, multiplicity of the words/coupons of a given probability/composition. We obtain 
a general theorem that gives an asymptotic equivalent for the expected waiting time of a general version of the Coupon 
Collector. This theorem is especially well-suited for classes of coupons featuring high multiplicities. Its application 
to a given language essentially necessitates knowledge on the number of words of a given composition/probability. 
We illustrate the application of our theorem, in a step-by-step fashion, on four exemplary languages, whose analyses 
reveal a large diversity of asymptotic waiting times, generally expressible as $\kappa \cdot m^p \cdot (\log m)^q \cdot (\log\log m)^r$, for $m$
the number of words, and $p$, $q$, $r$ some positive real numbers.

Keywords: Coupon Collector Problem; Waiting Time; Random Generation; Weighted Context-free Languages 



1 Introduction 

The choice of a suitable random model for the input instances of an algorithm is critical for its analysis. In 
an attempt to capture non-uniform distributions naturally arising in real-life data, Denise et al. [5] studied
weighted languages, a natural generalization of context-free languages [10] where atomic weights are 
associated to each letter. The weight of a word is then simply the product of its letters' own weight. This 
naturally induces a probability distribution over the class of words of a given length n, where the proba- 
bility of any given word is proportional to its weight. Aside from arguably being the simplest non-uniform 
generalization of combinatorial classes, such distributions naturally arise in statistical physics (Boltzmann 
partition function), with direct applications in algorithm design (Monte-Carlo Markov Chains) and bioin- 
formatics [13]. Random generation algorithms were also proposed for these distributions [5], leading to 
an efficient multidimensional generalization of Boltzmann sampling [3]. 

These distributions, and their associated random generation algorithms, can also be found in bioinfor- 
matics, where RNA folding has been one of the leading problems of the past three decades. Given an RNA 
sequence of length n, composed of four types of nucleotides (A, C, G or U), the goal is to predict the sec- 
ondary structure, a non-crossing subset of experimentally-determined base-pairs (hydrogen bonds). This
coarse-grain representation of the 3D conformation of RNA molecules has been extensively studied from
a combinatorial perspective [19, 18]. A statistical sampling approach proposed by Ding and Lawrence [6] 
is one of the leading methods for tackling this problem. At the core of this method, one makes repeated 
calls to a random generation algorithm, which draws secondary structures with probability proportional 
to their Boltzmann factor. Unfortunately, the same structure may then be drawn several times, and such a
redundancy is arguably uninformative when the probability of each conformation can be exactly and
efficiently computed after each generation. One can thus
interpret this redundancy as a degradation of the algorithm performance, and analyze the expected time- 
complexity of generating k distinct conformations. In the worst-case scenario, the targeted number k of 
secondary structures is the total number of secondary structures. Since energy-weighted secondary struc- 
tures are in bijection with weighted peakless-Motzkin words, then the worst-case/average-case (resp. on 
k and n the length) complexity of the algorithm is exactly the waiting-time of completing the class of 
weighted Motzkin words of length n. 

Generalizing on this question, the central problem addressed by this article is that of the Weighted 
Words Collector: Given a formal language and a word length n, how many calls to a weighted generation 
algorithm must be made before all the words of length n are obtained? This problem is clearly a weighted 
instance of the ubiquitous Coupon Collector problem which, given a finite collection $C_m$ of $m$ items
produced by a random source, studies the expected waiting time $E[C_m]$ of the full collection $C_m$, i.e.
the expected number of generations before each item in $C_m$ is present in the generated set. This problem
naturally arises in a large variety of contexts, including the analysis of database [2] and network [11] 
probabilistic algorithms. In the specific context of weighted languages, the two main specificities are the 
non-uniform nature of the weighted distribution and the potentially large multiplicity of coupons. 

In the uniform distribution, either probabilistic or combinatorial arguments can be used to establish that
$E[C_m] = m \cdot H(m) \in \Theta(m \log m)$, where $H(m) = \sum_{i=1}^{m} 1/i$ is the $m$-th harmonic number. For general
distributions, where the $i$-th object is generated with probability $p_i$, Flajolet, Gardy and Thimonier [8] gave
a general expression for the waiting time of the full collection:

$$E[C_m] = \int_0^{\infty} \left( 1 - \prod_{i=1}^{m} \left( 1 - e^{-p_i t} \right) \right) dt. \qquad (1.1)$$



However, specializing this formula for a given probability distribution seldom leads to spectacular simpli- 
fications, and the derivation of asymptotic estimates for parameterized families of items usually remains 
challenging. To overcome this limitation, many efforts have focused on providing closed-form approxima- 
tions [2], asymptotic equivalents [4, 14] and algorithms for computing the waiting time over non-uniform 
distributions of diverse degrees of generality. Weighted distributions over languages can be seen as highly 
specialized non-uniform coupon collections, whose major specificity is that many items may share the 
same probability or, in other words, some probability may appear with large multiplicity. Unfortunately, 
previous results either fail to apply to classes of coupons of high multiplicity, lead to bounds on the 
asymptotic behavior that are not tight [12], or require extensive a priori knowledge on the distribution, 
motivating further studies in the context of languages. 
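
For very small collections, the expected waiting time of a non-uniform collector can be evaluated exactly by inclusion-exclusion over subsets of coupons, which is equivalent to the integral representation of Flajolet, Gardy and Thimonier. The sketch below is only an illustration; the function names `expected_waiting_time` and `weighted_probs` are hypothetical, and the exponential cost makes it unusable beyond a handful of coupons, which is precisely why asymptotic estimates are needed.

```python
from itertools import combinations


def weighted_probs(weights):
    """Turn positive weights into probabilities p_i = w_i / mu."""
    mu = sum(weights)
    return [w / mu for w in weights]


def expected_waiting_time(probs):
    """Exact expected number of draws before every coupon has been seen.

    Inclusion-exclusion over non-empty subsets S of coupons:
        E = sum_S (-1)^(|S|+1) / sum_{i in S} p_i
    The cost is exponential in len(probs).
    """
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total


# Three equiprobable coupons versus a skewed weighting of the same collection.
uniform = expected_waiting_time(weighted_probs([1, 1, 1]))
skewed = expected_waiting_time(weighted_probs([1, 1, 4]))
print(uniform, skewed)
```

In the uniform case the value coincides with $m \cdot H(m)$, and skewing the weights increases the waiting time.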

Intuitively, the waiting time of a non-uniform instance of the Coupon Collector problem is dominated 
by the generation of a subset composed of the least probable items. Indeed, some subset of items can be so 
improbable that it is typically fully obtained only after all the other items in the collection are generated. 
In such cases, a lower bound on the waiting time can be obtained by isolating the subset and analyzing 
its waiting-time as a uniform coupon collector problem. However, deciding which subset to study can be
rather challenging, as the waiting time usually arises as a subtle tradeoff between the probability and the
multiplicity. In the case of weighted languages, the presence of coupons having, simultaneously, large
multiplicities and equally large discrepancies in their probabilities gives rise to a rich variety of asymptotic
behaviors, and calls for a sophisticated, arguably technically involved, analysis.

After a brief introduction, this extended abstract states, in Section 2, a general theorem for weighted 
families of coupons. More precisely, Theorem 2.1 relates the asymptotic behavior of a general Weighted 
Coupon Collector Problem to the multiplicity and weight of the i-th class of coupons. Section 3 compares 
the scope of the theorem with previous works addressing a similar problem. Section 4 develops a
methodology to ease the verification of the conditions of Theorem 2.1 in the case of context-free languages, and
applies it on illustrative examples. Finally, we conclude in Section 5 by summarizing the contribution and
describing future developments. 

2 A general theorem for coupons of large multiplicities 

2.1 Definitions and notations

Given a sequence $w = \{w_i\}_{i=1}^{m}$ of positive numbers, or weights, associated with a collection $C_m$ of
items, one defines a weighted probability distribution $\{p_i\}_{i=1}^{m}$ over $C_m$ as:

$$p_i = \frac{w_i}{\mu(m)}, \quad \forall i \le m, \quad \text{where } \mu(m) = \sum_{i=1}^{m} w_i.$$



In this work, we are interested in distributions with high multiplicity, in the sense that multiple items may
share the same weight/probability. Let us then introduce $W_m = \{W_{m,j}\}_j$ the increasingly-ordered, finite,
sequence of all distinct weights in $w$. Furthermore, for each $i \in [1, |W_m|]$, let us denote by $M_{m,i}$ the
multiplicity of the weight $W_{m,i}$, i.e. the number of occurrences of $W_{m,i}$ in $w$. We observe that:

$$m = \sum_{i=1}^{|W_m|} M_{m,i} \quad \text{and} \quad \mu(m) = \sum_{i=1}^{|W_m|} M_{m,i} \cdot W_{m,i}.$$
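
As a concrete illustration of these notations, the following sketch groups a list of weights into its distinct values and their multiplicities, and checks the two identities above. The helper name `weight_classes` and the sample weights are hypothetical, chosen only for the example.

```python
from collections import Counter


def weight_classes(weights):
    """Return (W, M): distinct weights in increasing order and their multiplicities."""
    counts = Counter(weights)
    distinct = sorted(counts)
    multiplicities = [counts[w] for w in distinct]
    return distinct, multiplicities


w = [1, 2, 2, 4, 2, 1]            # arbitrary sample weights
W, M = weight_classes(w)
m = sum(M)                        # number of coupons
mu = sum(Mi * Wi for Mi, Wi in zip(M, W))   # total weight
print(W, M, m, mu)
```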

2.2 Main result 

We describe a first-order asymptotic expression for the expected time of the full collection, assuming a
large number $m$ of items. Accessible weights $W_m$ and their multiplicities $M_m$ may in principle vary for
different values of $m$, leading, in the extreme case, to the absence of a limit expression for the waiting
time. Therefore we restrict the scope of our main theorem to distributions that obey three, essentially
technical, conditions.

H1 - The number $m$ of coupons and the weight rank $i$ may interact only in a simple way within the
multiplicity of a given weight. Thus we require that:

- There exist functions $f_1, \ldots, f_p$, $g_1, \ldots, g_p$, $h$ and $H$, such that

$$M_{m,i} \underset{m \to \infty}{\sim} \frac{e^{\sum_{j=1}^{p} f_j(i) g_j(m)}}{h(i)}, \quad \text{and} \quad M_{m,i} \le \frac{e^{\sum_{j=1}^{p} f_j(i) g_j(m)}}{H(i)}, \quad \forall m \ge 1, \ \forall i \le |W_m|.$$

- The functions $f_1$ and $g_1$ must effectively determine the growth of $M_{m,i}$, therefore one requires
that: $f_1$ is positive and non-zero everywhere, $g_j(m) = o(g_1(m))$, $\forall j \in [2, p]$, and $g_1(m) \to +\infty$.

- Finally, $\sum_{i \in [1, |W_m|]} 1/H(i)$ must converge, to prevent $H$ from capturing the growth of $M_{m,i}$.

H2 - Similarly, we restrict the possible interactions of the weight rank $i$ and the number $m$ of items within
the $i$-th weight $W_{m,i}$, by requiring the existence of functions $v(i) > 0$ and $\omega(m) > 0$ such that

$$W_{m,i} \ge v(i) \cdot \omega(m), \quad \forall m \ge 1, \ \forall i \ge 1,$$

and such that any weight at rank $i$, beyond some value of $m$, remains constant:

$$\forall k > 0, \ \exists m_k > 0 \text{ such that } W_{m,i} = v(i) \cdot \omega(m), \quad \forall m \ge m_k, \ \forall i \le k.$$

H3 - The multiplicity $M_{m,i}$ must not grow too quickly in comparison with the weight $W_{m,i}$. More
precisely, if $|W_m| \to \infty$ as $m \to \infty$, then one must have

$$\frac{v(i)}{F(i)} \underset{i \to \infty}{\longrightarrow} +\infty.$$

The conditions are sufficient (yet not always necessary) to obtain the asymptotic behavior of the waiting 
time, and hold for a large class of weighted languages. 

Theorem 2.1 Assume that, for all $m > 0$, the weights $W_m$ and multiplicities $M_m$ of the coupon collection
satisfy the conditions H1, H2, and H3. Then, as $m \to \infty$, one has

$$E[C_m] = t^*(F, v) \cdot G(m) \cdot \frac{\mu(m)}{\omega(m)} \cdot (1 + o(1)), \qquad (2.1)$$

where:

• $\mu(m)$ is the total weight of all coupons;

• $F = f_1$ and $G = g_1$, defined in H1, drive the leading term of the growth of $M_{m,i}$ as $m \to \infty$;

• $\omega(m) \cdot v(1)$ is the smallest weight within the collection of cardinality $m$ (see H2);

• $t^*(F, v)$ is the largest value of $t$ such that there exists $x \in \mathbb{N}$ such that $F(x) - t \cdot v(x) \ge 0$.
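
Since $v$ is positive, the constant $t^*(F, v)$ is simply the largest value taken by the ratio $F(x)/v(x)$. The sketch below evaluates it over a finite range of ranks, which is an assumption justified only when the ratio tends to 0 (as H3 requires). The helper `t_star` is hypothetical, and the example functions $F(x) = x$ and $v(x) = (3/2)^x$ are our reading of the weighted two-letter example of Figure 1, with $x$ counting occurrences of the heavier letter.

```python
def t_star(F, v, xs):
    """Largest t such that F(x) - t * v(x) >= 0 for some x in xs.

    Assumes v(x) > 0 on xs, so that the condition reads t <= F(x) / v(x).
    """
    return max(F(x) / v(x) for x in xs)


# Assumed example: one letter of weight 1, one letter of weight 3/2.
ratio = 1.5
ts = t_star(lambda x: x, lambda x: ratio ** x, range(0, 200))
print(ts)
```

Under these assumptions the maximum is reached at small ranks, which is consistent with the value $t^* = 8/9$ quoted in the caption of Figure 1.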
Sketch of proof. 

We give here a brief description of our proof, whose details can be found in Appendix A.
Applying a linear change of variable to Equation (1.1) gives

$$E[C_m] = \frac{\mu(m)}{\omega(m)} \, g_1(m) \int_0^{\infty} \Psi_m(t) \, dt \quad \text{where} \quad \Psi_m(t) = 1 - \prod_{i=1}^{|W_m|} \left( 1 - e^{-t \, g_1(m) \, W_{m,i} / \omega(m)} \right)^{M_{m,i}}.$$

Fig. 1: Plots of the $\Psi_m(t)$ functions as appearing for a uniform (Left) and weighted (Right, $\pi(a)/\pi(b) = 2/3$)
distribution over the $(a + b)^*$ language. We consider $m = 2^k$ coupons/words, for several values of $k \in
\{3, 6, 9, 12, 15, 18, 21\}$. The convergence of $\Psi_m(t)$ towards a step function when $m \to \infty$, featuring a transition
at $t^*$ ($t^* = 1$ in the uniform distribution and $t^* = 8/9$ in the weighted one), is crucial to our approach.

Focusing on this expression, one shows that the integral of $\Psi_m(t)$ converges towards some constant.
Indeed, numerical computations, as illustrated by Figure 1, suggest that $\Psi_m$ converges toward a step
function when $m \to \infty$. This can be rigorously proved under the conditions H1, H2 and H3, and the
integral from 0 to $t^*(f_1, v)$ converges to $t^*(f_1, v)$, while the remaining integral converges to 0. □
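
The convergence towards a step function can be observed numerically in the uniform case. The sketch below assumes that, for $m$ equiprobable coupons, the rescaled integrand reduces to $\Psi_m(t) = 1 - (1 - m^{-t})^m$ (obtained by taking $g_1(m) = \log m$ and a single weight of multiplicity $m$); the function name `psi_uniform` is hypothetical.

```python
import math


def psi_uniform(m, t):
    """Assumed uniform-case integrand: 1 - (1 - m**(-t))**m, for t > 0.

    log1p keeps the evaluation accurate when m**(-t) is tiny.
    """
    return 1.0 - math.exp(m * math.log1p(-(m ** (-t))))


for k in (3, 9, 15, 21):
    m = 2 ** k
    values = [round(psi_uniform(m, t), 4) for t in (0.5, 0.9, 1.0, 1.1, 1.5)]
    print(k, values)
```

As $k$ grows, the printed values approach 1 to the left of $t = 1$ and 0 to its right, in line with the transition at $t^* = 1$ described for the uniform distribution.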

Remark 1 Theorem 2.1 applies in the special case of the uniform distribution. Indeed, considering the
weight collection $\{w_i\}_{i=1}^{m} = \{1\}_{i=1}^{m}$, one has $p_i = 1/m$ and $\mu(m) = m$. The set of weights is then
reduced to the singleton $W_m = (1)$, which has multiplicity $M_{m,1} = m = e^{\log m}$.

• H1 is satisfied upon taking

$$F(i) := f_1(i) = 1, \quad G(m) := g_1(m) = \log m, \quad h(1) = 1 \quad \text{and} \quad H(1) = 1,$$

noticing that $\sum_{1 \le i \le |W_m|} 1/H(i) = 1/H(1)$, which obviously converges.

• Since one has $W_{m,1} = 1$, then H2 is satisfied with $v(1) = 1$ and $\omega(m) = 1$.

• Since $W_m$ is finite, the limit condition of H3 does not need to be verified.

One then easily verifies that $t^*(f_1, v) = t^*(1, 1) = 1$, and applying Theorem 2.1 unsurprisingly gives
$E[C_m] \sim m \log m$, which is the well-known asymptotics of the uniform coupon collector.

3 Comparison with existing results 

Let us compare the scope of our result with previous work on the subject; we remind the reader that m 
is the number of coupons/words, which typically grows exponentially along with n the length of words. 
Given the rich literature dealing with variations on the Coupon Collector problem (e.g. waiting time of 
first occurrence of a ^-duplicated collection [1]), we will restrict our comparison to three results that are 
representative of the main approaches used to tackle the problem.

Berenbrink and Sauerwald [2]: $O(\log\log m)$ and $O(\log\log\log m)$ approximations for general distributions

Building on previous results [16], Berenbrink and Sauerwald [2] consider the two approximations 

m log log m 

U 2 :=Y— and Ua := V 

f^iPi i^i 1 e Pl 

where gi is the number of coupons c such that e 4_1 < p c /pi < e 1 , and {ji\ l °^l° Em is a sequence 
of indices such that {-^j^'Hg H y°^ ogm is decreasing. They show that Ui and Ua approximate E[C m ] 
within $O(\log\log m)$ and $O(\log\log\log m)$ ratios respectively. More precisely, they show that

< E[C m ] <2-U 2 and - — U \ < E[C m ] < 35 • Ua- 



3e log log to ~ log log log m 

Furthermore, the first approximation can be computed in polynomial time (on $n$), since there exist at most $n^{|\Sigma|}$
compositions/weights. However, the exponential growth of $m$ on $n$ limits the final precision of the approximation
ratio to $O(\log n)$. Finally, an efficient evaluation of the second approximation would yield an $O(\log\log\log m)$ approximation
in time $O(\log n)$. Unfortunately, figuring out a suitable sequence $\{j_i\}_{i=1}^{\log\log m}$ remains challenging, and
seems to require knowledge over the multiplicity of coupons comparable to the one required for the
application of Theorem 2.1.

Boneh and Papanicolaou [4]: Asymptotic estimates for truncated sequences of 
weighted coupons 

The authors derive general results for the asymptotics of the coupon collector problem under fairly general 
distributions of coupons. They consider a fixed sequence of strictly positive weights $a = \{a_k\}_{k=1}^{\infty}$, and
study the truncation $a_m$ of $a$ to its first $m$ terms.

Their first result requires the existence of a $\xi \in ]0, 1]$ such that $S := \sum_{k=1}^{\infty} \xi^{a_k} < \infty$. However, under
the hypotheses of our main Theorem 2.1, there always exists a weight of unbounded multiplicity as $m$
goes to infinity, and $S$ therefore diverges for any value of $\xi$.

Their second result is based on the assumption of a decreasing sequence $a$. However, many weighted
distributions that satisfy hypotheses H1 to H3 cannot be defined by truncating a fixed decreasing sequence.
For instance, suppose that for all m, the accessible weights are {^j— -}£Li U {2}, each appearing with 
multiplicity $m$. It is easily checked that such a set of weights cannot be ordered into a decreasing sequence
whose truncations include the families of coupon weights.

Conversely, distributions with low multiplicity satisfying their conditions are not covered by our
Theorem 2.1. Therefore, their results and ours are complementary, and seldom overlapping.

Neal [14]: The limiting distribution

Neal studied the distribution of the waiting time. Although the results described in the article can in 
principle be used to assess the expectation of the waiting time, checking the prerequisites of its main 
theorem turns out to be considerably more involved than checking those of Theorem 2.1. In particular, one
has to figure out suitable sequences, respectively related to the expectation and variance of the distribution,
from which the limiting distribution follows. This result is therefore mostly suitable to prove a conjectured
distribution from a limited list of its moments. Conversely, knowledge of the expectation, as obtained from
our contribution, can help in figuring out suitable sequences to which their results can be applied.

4 Applications to languages: the word collector 

4.1 Weighted Languages

Let us recall some definitions introduced by Denise et al. [5]. Let $\mathcal{L}$ be a language defined on an alphabet
$\Sigma$, and let $\mathcal{L}_n$ be its restriction to words of size $n$. A positive weight $\pi_t$ is assigned to each letter $t$ of $\Sigma$.
One extends these weights multiplicatively on any word $\omega \in \mathcal{L}$ such that the weight of a word $\omega$ is

$$\pi(\omega) = \prod_{t \in \omega} \pi_t.$$

This naturally defines a weighted probability distribution on $\mathcal{L}_n$, given by

$$\mathbb{P}(\omega) = \frac{\pi(\omega)}{\sum_{\omega' \in \mathcal{L}_n} \pi(\omega')}.$$

With these definitions, $\mathcal{L}_n$ is an example of a coupon collection where each coupon is a word of $\mathcal{L}_n$. The
number $m$ of coupons is the number of words of $\mathcal{L}_n$. As $m$ is now a function of $n$, all the characteristics
of the weight distribution, such as $W_m$, will be indexed by $n$ instead of $m$.
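
A minimal sketch of this weighted distribution, assuming the unconstrained language of all words over the alphabet (any other language would only change the enumeration step). The function `word_distribution` and the sample letter weights are hypothetical.

```python
from itertools import product
from math import prod


def word_distribution(letter_weights, n):
    """Weighted distribution over all words of length n on the given alphabet.

    letter_weights maps each letter to its positive weight; the weight of a
    word is the product of the weights of its letters.
    """
    words = ["".join(p) for p in product(sorted(letter_weights), repeat=n)]
    weights = {w: prod(letter_weights[c] for c in w) for w in words}
    mu = sum(weights.values())            # total weight of the collection
    return {w: weights[w] / mu for w in words}, mu


pi = {"a": 1.0, "b": 2.0}                 # arbitrary example weights
dist, mu = word_distribution(pi, 3)
print(mu, dist["aaa"], dist["bbb"])
```

For the unconstrained language the total weight factorizes as the $n$-th power of the sum of the letter weights, which the example reproduces.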

4.2 Verifying preconditions H1, H2 and H3 in the context of weighted languages

Let us outline a systematic method to verify the preconditions H1, H2 and H3 for a language $\mathcal{L}$ defined
over an alphabet $\Sigma = (a_1, \ldots, a_k)$. The idea is, firstly, to classify the words of the language according
to their weights and find the number of words having a given weight (Step 1). Then one has to find
an ordering of the different weights (Step 2). If the order cannot be found explicitly, one has to find a
sufficient approximation of it (Step 3). Once this is done, the hypotheses of Theorem 2.1 are usually
easily verified.

• Step 1: Characterize the set of distinct weights. 

The weight of a word is directly related with its composition (or sub-composition). 

Definition 4.1 (Compositions and sub-compositions) The composition of a word is the vector of
occurrences of each letter within the word. More precisely, if a word $\omega$ has $x_1$ occurrences of the
letter $a_1$, ..., $x_k$ occurrences of the letter $a_k$, irrespectively of their order, then its composition is $(x_1, \ldots, x_k)$.
Suppose that $1 = \pi_{a_1} = \cdots = \pi_{a_l}$ for some $l$, then the sub-composition of a word of composition
$(x_1, \ldots, x_k)$ is the vector $(x_{l+1}, \ldots, x_k)$, in a $(k - l)$-dimensional space, sometimes denoted $x$.

Let us denote by $M(x)$ the number of words of $\mathcal{L}_n$ having a given sub-composition $x$. By definition,
any words having the same sub-composition share the same weight. The converse is not true in
general, and words having different sub-compositions can have the same weight.

Notation 4.2 $\Gamma_n \subset \mathbb{N}^{k-l}$ is the set of all distinct sub-compositions appearing in $\mathcal{L}_n$.

• Step 2: Find a suitable ordering of weights.

Firstly, let us define an ordering function over $\Gamma_n$, which will greatly help us characterize $W_n$.

Definition 4.3 (Ordering function) Let $\phi_n$ be the application that assigns, to each sub-composition
of $\Gamma_n$, the position of its weight in $W_n$. One has

$$\phi_n : \Gamma_n \to [1, |W_n|], \quad x \mapsto i \ \text{ such that } W_{n,i} = \pi(x). \qquad (4.1)$$

In general, this function is not bijective, therefore let us define the generalized inverse ordering
function $\varphi_n$ as follows:

$$\varphi_n : [1, |W_n|] \to \Gamma_n, \quad i \mapsto x, \ \text{ if } W_{n,i} = \pi(x) \text{ and } |x| = \min\{ |x'| : W_{n,i} = \pi(x') \}, \qquad (4.2)$$

where $|x| = x_{l+1} + \cdots + x_k$ if $x$ is the sub-composition $(x_{l+1}, \ldots, x_k)$.
With these definitions, $W_{n,i}$ and $M_{n,i}$ can be written in terms of $\phi_n$ and $\varphi_n$ as

$$W_{n,i} = \pi(\varphi_n(i)) \quad \text{and} \quad M_{n,i} = \sum_{x \in \Gamma_n, \ \phi_n(x) = i} M(x). \qquad (4.3)$$

Sub-compositions are vectors in a $(k - l)$-dimensional space. It is easily checked that the weight of
any sub-composition found underneath the $(k - l - 1)$-plane $H(x)$, of equation
$\sum_{j=l+1}^{k} y_j \log \pi_{a_j} = \sum_{j=l+1}^{k} x_j \log \pi_{a_j}$ in the variable $y$,
is smaller than $\pi(x)$, and that any sub-composition above has larger weight.

Definition 4.4 Let $\Lambda_n(x) \subset \Gamma_n$ be the set of sub-compositions below $H(x)$ (all the sub-compositions
that belong to $H(x)$ have the same weight), and $S_n(x)$ be the number of sub-compositions that
belong to $H(x)$.

Then one has the following expression for $\phi_n$:

$$\phi_n(x) = \sum_{x' \in \Lambda_n(x)} \frac{1}{S_n(x')}. \qquad (4.4)$$

Indeed, $\phi_n$ counts the number of sub-compositions, with distinct weights, under $H(x)$. If each
weight matches a unique sub-composition, then $S_n(x) = 1$ for all $x$, and $\phi_n(x) = |\Lambda_n(x)|$.

• Step 3: Approximate the ordering functions $\phi_n$ and $\varphi_n$.

Condition H3 directly follows from steps 1 and 2. However, conditions H1 and H2 require good
approximations of $|\Lambda_n|$ and $S_n$. Such approximations strongly depend on the language $\mathcal{L}$ of interest,
therefore we present several examples to illustrate the method.
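
For small lengths, the three steps can be carried out by brute force, which is a convenient sanity check before attempting an asymptotic analysis. The sketch below assumes the unconstrained language over an alphabet with a given number of unit-weight letters and a list of heavier letters, and further assumes that distinct sub-compositions have distinct weights. The function `ranked_subcompositions` is hypothetical.

```python
from itertools import product
from math import factorial, prod


def ranked_subcompositions(unit_letters, heavy_weights, n):
    """Rank the sub-compositions of words of length n by increasing weight.

    unit_letters  -- number l of letters of weight 1
    heavy_weights -- weights (> 1) of the remaining letters
    Returns rows (rank i, sub-composition x, weight, multiplicity M(x)).
    """
    rows = []
    for x in product(range(n + 1), repeat=len(heavy_weights)):
        if sum(x) > n:
            continue
        weight = prod(w ** xi for w, xi in zip(heavy_weights, x))
        free = n - sum(x)                       # positions left to unit letters
        # Multinomial coefficient placing the heavy letters among n positions.
        multinomial = factorial(n) // (prod(factorial(xi) for xi in x) * factorial(free))
        # Each remaining position takes any of the l unit-weight letters.
        rows.append((weight, x, multinomial * unit_letters ** free))
    rows.sort()                                  # Step 2: order by weight
    return [(i + 1, x, w, mult) for i, (w, x, mult) in enumerate(rows)]


table = ranked_subcompositions(1, [2, 3], 3)
for row in table:
    print(row)
```

The multiplicities must add up to the total number of words, which gives a quick consistency check on Step 1.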

4.3 Application to specific languages 

In this part, we shall denote by $D_n$ the collection of all words of length $n$, and assume that pairs of
non-unit weights are incommensurable, which implies that sub-compositions can be bijectively associated with
weights.

4.3.1 The unconstrained language $\Sigma^*$

Let us consider the language $\mathcal{L} = \Sigma^*$, where $\Sigma = (a_1, \ldots, a_k)$. It is worth noticing that the weighted
distribution is stable upon multiplying each weight by a constant factor, therefore we assume without loss
of generality that $1 = \pi_{a_1} = \cdots = \pi_{a_l}$ for some $l \ge 1$, and $1 < \pi_{a_{l+1}} < \cdots < \pi_{a_k}$.

Under these assumptions, one has $\Gamma_n = \{x' : |x'| \le n\}$. The function $\phi_n(x)$ counts the number
of sub-compositions under $H(x)$ which belong to $\Gamma_n$. Notice that, for sufficiently large values of
$n$, any sub-composition $x'$ belongs to $\Gamma_n$. It follows that there exists a function $\phi$ such that, for all
sub-compositions $x$ and for $n$ sufficiently large, one has $\phi_n(x) = \phi(x)$; similarly, $\varphi_n(i)$ stabilizes
to a limit $\varphi(i)$. From Equation (4.3), one has
$W_{n,i} = \pi(\varphi_n(i)) = \prod_{j=l+1}^{k} \pi_{a_j}^{x_j}$ with $(x_{l+1}, \ldots, x_k) = \varphi_n(i)$, and it follows that,
for sufficiently large values of $n$, one has $W_{n,i} = \pi(\varphi(i))$. Consequently, Condition H2 is verified with

$$v(i) = \pi(\varphi(i)) = \prod_{j=l+1}^{k} \pi_{a_j}^{x_j}, \ \text{ where } (x_{l+1}, \ldots, x_k) = \varphi(i), \quad \text{and} \quad \omega(n) = 1.$$

In $D_n$, the number of words of composition $(x_1, \ldots, x_k)$ is $M(x_1, \ldots, x_k) = \binom{n}{x_1, \ldots, x_k}$, thus the
number of words of sub-composition $x = (x_{l+1}, \ldots, x_k)$ is $M(x_{l+1}, \ldots, x_k) = l^{n - |x|} \binom{n}{x}$.
Since there exists only one sub-composition $x$ such that $\phi_n(x) = i$, then it follows from Equation (4.3)
that $M_{n,i} = l^{n - |\varphi_n(i)|} \binom{n}{\varphi_n(i)}$, where $\binom{n}{x}$ is the multinomial coefficient
$\binom{n}{x_{l+1}, \ldots, x_k, \, n - |x|}$. Since $\varphi_n = \varphi$ for
sufficiently large values of $n$, one has

$$M_{n,i} \sim l^{n - |\varphi(i)|} \binom{n}{\varphi(i)} \sim \frac{l^{n - |\varphi(i)|} \, n^{|\varphi(i)|}}{\varphi(i)!}, \qquad (4.5)$$

where $\varphi(i)!$ denotes the product of the factorials of the components of $\varphi(i)$.

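Let us now give some properties of the functions $\phi_n$, $\phi$, $\varphi_n$ and $\varphi$.

Lemma 4.5 Let $S := \sum_{j=l+1}^{k} \log \pi_{a_j}$, $P := \prod_{j=l+1}^{k} \log \pi_{a_j}$, and introduce the notation

$$|x|_\pi = x_{l+1} \log \pi_{a_{l+1}} + \cdots + x_k \log \pi_{a_k}.$$

Then the following inequalities hold:

i) For any sub-composition $x$,

$$\frac{|x|_\pi^{k-l}}{(k-l)! \, P} \le \phi(x) \le \frac{(|x|_\pi + S)^{k-l}}{(k-l)! \, P}. \qquad (4.6)$$

ii) For all $i > 0$, one has

$$\sqrt[k-l]{i \, (k-l)! \, P} - S \le |\varphi(i)|_\pi \le \sqrt[k-l]{i \, (k-l)! \, P} \qquad (4.7)$$

and

$$\frac{\sqrt[k-l]{i \, (k-l)! \, P} - S}{\log \pi_{a_k}} \le |\varphi(i)| \le \frac{\sqrt[k-l]{i \, (k-l)! \, P}}{\log \pi_{a_{l+1}}}.$$

iii) For all $x$ and $n > 0$, one has

$$\phi_n(x) \le \phi(x). \qquad (4.8)$$

iv) For all $n > 0$ and $i \ge 1$, one has

$$\frac{\log \pi_{a_{l+1}}}{\log \pi_{a_k}} \, |\varphi(i)| \le |\varphi_n(i)| \le |\varphi(i)|. \qquad (4.9)$$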

Proof.

i) Recall that $\phi(x)$ counts the number of points which are under the $(k - l - 1)$-plane $H(x)$. Equation
(4.6) just consists in bounding $\phi$ by the volume of the $(k - l)$-pyramid under $H(x_{l+1}, \ldots, x_k)$
and the $(k - l)$-pyramid under $H(x_{l+1} + 1, \ldots, x_k + 1)$.

ii) The first equation is obtained from Equation (4.6), taking $x = \varphi(i)$. For the second equation, one
uses the fact that $|x| \cdot \log \pi_{a_{l+1}} \le |x|_\pi \le |x| \cdot \log \pi_{a_k}$.

iii) The function $\phi_n(x)$ counts the number of sub-compositions which are both under $H(x)$ and belong
to $\Gamma_n$, whereas $\phi(x)$ counts the number of sub-compositions which are under $H(x)$.

iv) For a given length $n > 0$, any sub-composition is found below the $|x| = n$ hyperplane and, in
particular, one has $|\varphi_n(i)| \le n$. For some sufficiently large value of $n' > n$, the sub-composition
of $i$-th weight becomes fixed; if it differs from $\varphi_n(i)$, it is necessarily a sub-composition of
$\Gamma_{n'}$ that did not belong to $\Gamma_n$. Consequently, this sub-composition is above the $|x| = n$
hyperplane, one has $|\varphi(i)| > n$ and one finally gets $|\varphi_n(i)| \le |\varphi(i)|$, $\forall n > 0$, $\forall i \ge 1$.

On the other hand, the sub-composition $\varphi(i)$ must be below the hyperplane $|x|_\pi = |\varphi_n(i)|_\pi$,
otherwise its weight would be larger than the one of $\varphi_n(i)$. This gives $|\varphi_n(i)|_\pi \ge |\varphi(i)|_\pi$. Since any
sub-composition obeys

$$\frac{|x|_\pi}{\log \pi_{a_k}} \le |x| \le \frac{|x|_\pi}{\log \pi_{a_{l+1}}},$$

one has

$$|\varphi_n(i)| \ge \frac{|\varphi_n(i)|_\pi}{\log \pi_{a_k}} \ge \frac{|\varphi(i)|_\pi}{\log \pi_{a_k}} \ge \frac{\log \pi_{a_{l+1}}}{\log \pi_{a_k}} \, |\varphi(i)|,$$

which concludes the proof.

□
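
The volume argument behind the first item of the lemma can be checked numerically. The sketch below counts the integer points lying on or below the hyperplane through a sub-composition (an assumption on how boundary points are counted) and compares the count with the volumes of the two pyramids. The helper names and the example weights 2 and 3 are hypothetical.

```python
import math
from itertools import product


def log_weights(heavy_weights):
    return [math.log(w) for w in heavy_weights]


def count_below(x, heavy_weights):
    """Number of integer points y >= 0 with |y|_pi <= |x|_pi."""
    c = log_weights(heavy_weights)
    level = sum(xi * ci for xi, ci in zip(x, c))
    # No coordinate y_j can exceed level / c_j; one extra unit of slack is harmless.
    ranges = [range(int(level / ci) + 2) for ci in c]
    return sum(1 for y in product(*ranges)
               if sum(yi * ci for yi, ci in zip(y, c)) <= level)


def pyramid_bounds(x, heavy_weights):
    """Volumes of the pyramids under H(x) and under H(x + (1, ..., 1))."""
    c = log_weights(heavy_weights)
    d = len(c)
    level = sum(xi * ci for xi, ci in zip(x, c))
    denom = math.factorial(d) * math.prod(c)
    return level ** d / denom, (level + sum(c)) ** d / denom


for x in [(3, 2), (5, 1), (0, 4)]:
    lower, upper = pyramid_bounds(x, [2.0, 3.0])
    print(x, lower, count_below(x, [2.0, 3.0]), upper)
```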



Combining Equations (4.5) and (4.8), one obtains bounds for the leading term of $M_{n,i}$, for all $i$ and as
$n \to \infty$, such that



n 

0«(O 



I0JOH - 



log 7 



-\4>{i)\ 



,I*WI 



The convergence of $\sum_i 1/(z|\varphi(i)|)!$ follows from Equation (4.7). Therefore, H1 is satisfied for
the following choice of functions

$$\begin{array}{c|cccccc}
 & F(i) := f_1(i) & f_2(i) & G(n) := g_1(n) & g_2(n) & h(i) & H(i) \\ \hline
l = 1 & |\varphi(i)| & - & \log n & - & \varphi(i)! & (z|\varphi(i)|)! \\
l > 1 & \log l & |\varphi(i)| & n & \log n & l^{|\varphi(i)|} \, \varphi(i)! & l^{z|\varphi(i)|} \, (z|\varphi(i)|)!
\end{array}$$

where $z = \log \pi_{a_{l+1}} / \log \pi_{a_k}$. Furthermore it can be verified that H3 is satisfied, since Equation (4.7)
gives a lower bound for $v(i)/F(i)$. Consequently, Theorem 2.1 applies to the weighted distribution on
$\Sigma^*$, and we get:

Proposition 4.6 The expected waiting time $E[D_n]$ for obtaining all words of length $n$ in $\mathcal{L} = \Sigma^*$ admits
the following asymptotic behavior:

$$E[D_n] \sim \begin{cases} \kappa_1 \cdot \mu(n) \cdot \log n & \text{if } l = 1, \\ \kappa_2 \cdot \mu(n) \cdot n & \text{otherwise,} \end{cases}$$

where $l$ is the number of letters of lowest weight, $\mu(n) = \left( l + \sum_{j=l+1}^{k} \pi_{a_j} \right)^n$ is the total weight,
$\kappa_1 = t^*(|\varphi|, \Lambda)$, and $\kappa_2 = t^*(\log l, \Lambda)$ with $\Lambda(i) = v(i) = \pi(\varphi(i))$.

Corollary 4.7 Define $p = \log(\pi_{a_1} + \cdots + \pi_{a_k}) / \log k$, noting that $p \ge 1$ and $p = 1$ only in the uniform
case. The expected waiting time $E[C_m]$ for obtaining the $m = k^n$ words of length $n$ in $\mathcal{L} = \Sigma^*$ is
asymptotically equivalent to

• $\kappa_1 \cdot m^p \cdot \log\log m$, if there is a single letter of smallest weight;

• $(\kappa_2 / \log k) \cdot m^p \cdot \log m$, if there are at least two letters of smallest weight.
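
The exponent $p$ simply re-expresses the total weight in terms of the number of words, since the total weight is the $n$-th power of the sum of the letter weights while $m = k^n$. A small numerical check, with a hypothetical helper `collector_exponent` and arbitrary example weights normalized so that the smallest weight is 1:

```python
import math


def collector_exponent(letter_weights):
    """p = log(sum of letter weights) / log(alphabet size)."""
    return math.log(sum(letter_weights)) / math.log(len(letter_weights))


weights = [1.0, 1.5]          # arbitrary example, smallest weight normalized to 1
n = 10
p = collector_exponent(weights)
m = len(weights) ** n         # number of words of length n
mu = sum(weights) ** n        # total weight of these words
print(p, m ** p, mu)
```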

4.3.2 Motzkin words 

Motzkin words are well-parenthesized expressions featuring any number of dot characters •. This
language, denoted by $\mathcal{L}^{(m)}$, is generated by the context-free grammar

$$S \to ( S ) S \mid \bullet S \mid \varepsilon.$$

Here we study the expected waiting time to generate all Motzkin words of even length $n$. For the sake
of readability, we replace the characters (, ) and • by letters $a$, $\bar{a}$ and $b$ respectively. Since parentheses
come in pairs, any word has an equal number of occurrences of $a$ and $\bar{a}$, and the parity of the number of
occurrences of $b$ is the parity of the word length. Consequently, accessible compositions for words of
length $n$ are triplets $(x_a, x_{\bar{a}}, x_b)$ of the form $(k, k, n - 2k)$, with $0 \le k \le n/2$. The number of words of
size $n$ having such a composition is then given by

$$M(k, k, n - 2k) = \frac{1}{k+1} \binom{2k}{k} \binom{n}{2k}.$$

The expected waiting time shows two types of behavior depending on whether $a$ or $\bar{a}$ have the smallest
weight. To give a flavor of our result and illustrate its proof strategy, we explicitly derive two exemplary
results for the cases where $1 = \pi_b < \pi_a < \pi_{\bar{a}}$ and $1 = \pi_a = \pi_{\bar{a}} < \pi_b$, and give the general form of the
asymptotic equivalent for the weighted Coupon Collector.
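
The count of Motzkin words by composition is a Catalan number (for the well-parenthesized skeleton of $k$ pairs) times a binomial coefficient (for the positions of the $2k$ parentheses among the $n$ letters). The sketch below evaluates it, together with the resulting total weight; the function names are hypothetical, and `pair_weight` stands for the product of the weights of the two parenthesis letters.

```python
from math import comb


def motzkin_by_pairs(n, k):
    """Number of Motzkin words of length n with exactly k pairs of parentheses."""
    catalan = comb(2 * k, k) // (k + 1)
    return catalan * comb(n, 2 * k)


def total_weight(n, pair_weight, dot_weight):
    """Total weight of all Motzkin words of length n."""
    return sum(motzkin_by_pairs(n, k) * pair_weight ** k * dot_weight ** (n - 2 * k)
               for k in range(n // 2 + 1))


counts = [motzkin_by_pairs(6, k) for k in range(4)]
print(counts, sum(counts), total_weight(4, 1, 1))
```

With all weights set to 1, the total weight is the number of Motzkin words of the given length.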

First case: ($1 = \pi_b < \pi_a < \pi_{\bar{a}}$). Here, the sub-compositions $(x_a, x_{\bar{a}})$ are of the form $(k, k)$, $0 \le k \le
n/2$, and the associated weights are of the form $\pi_a^k \pi_{\bar{a}}^k$, increasing with $k$. Therefore one has $W_{n,i} =
\pi_a^{i-1} \pi_{\bar{a}}^{i-1}$, and H2 is satisfied with $v(i) = \pi_a^{i-1} \pi_{\bar{a}}^{i-1}$ and $\omega(n) = 1$. The number of words having weight
$W_{n,i}$, or equivalently of sub-composition $(i-1, i-1)$, is given by

$$M_{n,i} = \frac{1}{i} \binom{2i-2}{i-1} \binom{n}{2i-2} \underset{n \to \infty}{\sim} \frac{n^{2i-2}}{i \, (2i-2)!} \binom{2i-2}{i-1} = \frac{n^{2i-2}}{i \, (i-1)!^2}.$$

Moreover, for all $i \le n/2$, one has $M_{n,i} \le \frac{n^{2i-2}}{(i-1)!^2}$, and H1 is satisfied with

$$F(i) := f_1(i) = 2i - 2, \quad G(n) := g_1(n) = \log n, \quad h(i) = i \, (i-1)!^2, \quad H(i) = (i-1)!^2,$$

coupled with the observation that $\sum_i 1/(i-1)!^2$ converges. The verification of H3 is immediate, and
applying Theorem 2.1 readily gives the following result.

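Proposition 4.8 The expected waiting time of the full collection of weighted Motzkin words of even length
$n$, under the configuration $1 = \pi_b < \pi_a < \pi_{\bar{a}}$, admits the following asymptotic behavior:

$$E[D_n] \sim \kappa \cdot \mu(n) \cdot \log n$$

where $\kappa = t^*\left(2i - 2, \ \pi_a^{i-1} \pi_{\bar{a}}^{i-1}\right)$ and $\mu(n) = \sum_{k=0}^{n/2} \frac{1}{k+1} \binom{2k}{k} \binom{n}{2k} \pi_a^k \pi_{\bar{a}}^k$.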

Second case: ($1 = \pi_a = \pi_{\bar{a}} < \pi_b$). In this second case, the sub-compositions $(x_b)$ are of the form
$(n - 2k)$, for $0 \le k \le n/2$, and the weight of a word increases with the number of occurrences of
$b$. Consequently, one has $W_{n,i} = \pi_b^{2(i-1)}$, and H2 is satisfied with $v(i) = \pi_b^{2(i-1)}$ and $\omega(n) = 1$.

Furthermore, if $(n - 2k)$ is the sub-composition of the $i$-th weight, then $n - 2k = 2(i - 1)$, leading to
$k = n/2 - (i - 1)$ and one finally has

$$M_{n,i} = \frac{1}{\frac{n}{2} - (i-1) + 1} \binom{n - 2(i-1)}{\frac{n}{2} - (i-1)} \binom{n}{n - 2(i-1)} \underset{n \to \infty}{\sim} \frac{2^n \, n^{2(i-1) - \frac{3}{2}}}{\sqrt{\pi} \, 2^{2(i-1) - \frac{3}{2}} \, (2(i-1))!}.$$

Finally, one has $M_{n,i} \le \frac{2^n \, n^{2(i-1) - \frac{3}{2}}}{\sqrt{\pi} \, 2^{2(i-1) - \frac{3}{2}} \, (2(i-1))!}$, for $i \le n/2$, and H1 is satisfied with

$$f_1(i) = \log 2, \quad f_2(i) = 2(i-1) - \tfrac{3}{2}, \quad g_1(n) = n, \quad g_2(n) = \log n, \quad h(i) = H(i) = \sqrt{\pi} \, 2^{2(i-1) - \frac{3}{2}} \, (2(i-1))!,$$

since $\sum_i 1/\left(2^{2i} \, (2(i-1))!\right)$ converges. Again, verifying H3 is immediate.

i 

Proposition 4.9 The expected waiting time of the full collection of weighted Motzkin words of even length
$n$, under the configuration $1 = \pi_a = \pi_{\bar{a}} < \pi_b$, admits the following asymptotic behavior:

$$E[D_n] \sim \kappa \cdot \mu(n) \cdot n$$

where $\kappa = t^*\left(\log 2, \ \pi_b^{2(i-1)}\right)$ and $\mu(n) = \sum_{k=0}^{n/2} \frac{1}{k+1} \binom{2k}{k} \binom{n}{2k} \pi_b^{n-2k}$.

This approach can be extended to any relative positioning of $\pi_a$, $\pi_{\bar{a}}$ and $\pi_b$. The symmetrical roles
played by the letters $a$ and $\bar{a}$ allow for a restriction, without loss of generality, to cases where $\pi_a \le \pi_{\bar{a}}$.
Also, singularity analysis can be applied to the generating function of the weighted Motzkin language,
giving $\mu(n) \sim \kappa \cdot \rho^{-n} \cdot n^{-3/2}$, with $\rho = (\pi_b + 2\sqrt{\pi_a \pi_{\bar{a}}})^{-1}$.

Proposition 4.10 The expected waiting time for generating all Motzkin words of length n obeys: 



E[D„] 



s/n 



With p — , ^5 if TT? > 7T a 7T a , 



K /£ —^\oen with p' 

n\Jn ° ' 



Tr b + 2^/TT a TTa 



where K and k' are constants ofn which can be explicitly computed ( and depends on the relative positions 
of the weights). 
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Fig. 2: Secondary structure of a 5s ribosomal RNA. A well-parenthesized expression (lower-left) unambiguously 
defines a set of matching position (upper-left) which folds into a projection of a three-dimensional conformation of 
the molecule. The latter representation illustrates the relationship between the > 6 steric constraint and the absence 
of sharp turns. 

Corollary 4.11 Let m be the number of Motzkin words of length n (m ~ 3(^/3/2^/n)3 n n' 3 / 2 ). The 
expected waiting time for generating the complete collection of m words obeys 

f K-mV.\o g {m)^ wi thp= l °^ +2 ^- 10 ^ ifn 2 b >n a n- a , 

nl ~ \ k' ■ mP' ■ logM 1 ^ • loglogm with p> = ^±1^)^11 if < M? 

for constants $\kappa$ and $\kappa'$ that can be explicitly computed (and depend on the relative positions of the weights).

4.3.3 RNA secondary structures 

Through an adaptation of Viennot et al [17], secondary structures can be generated by a grammar:

$S \to (\,S_{\ge\theta}\,)\,S \mid \bullet\,S \mid \varepsilon$ and $S_{\ge\theta} \to (\,S_{\ge\theta}\,)\,S \mid \bullet\,S_{\ge\theta-1}$

where $\theta$ is the minimal distance between matching parentheses, enforcing steric constraints. The connection between the secondary structure and the conformations of an RNA sequence is illustrated by Figure 2: matching parentheses represent base-pairs, or interacting pairs of nucleotides mediated by hydrogen bonds. Such base-pairs are known to stabilize a secondary structure, thus decreasing its free-energy. We consider a simple free-energy model proposed by Nussinov [15], which assigns a $-1$ kcal/mol contribution to each base-pair. The free-energy $E(S)$ of a secondary structure $S$ is then obtained additively, by summing the individual contributions of its base-pairs.

One can assume a Boltzmann distribution on the set of secondary structures, where the probability of any secondary structure $S$ of length $n$ is proportional to its Boltzmann factor $e^{-E(S)/RT}$, with $R$ the gas constant and $T$ the temperature. Such a non-deterministic perspective over the RNA folding process is fundamental to a recent paradigm shift in RNA structure prediction [6] based on random generation. In the worst-case scenario, the complexity of this algorithm is equivalent to a coupon collector for Boltzmann weighted secondary structures. It is then worth noticing that the Boltzmann distribution is just a special case of a weighted distribution, where a neutral weight 1 is assigned to unpaired positions, and a weight $e^{1/RT}$ to each pair of matching parentheses.

Again in this example, we replace the characters (, ) and • by letters $a$, $\bar a$ and $b$ respectively. Let us denote by $\mathcal{L}^{(rna)}$ the language of RNA secondary structures. For the sake of simplicity, let us assume, without loss of generality, that $1 = \pi_b < \pi_a < \pi_{\bar a}$, with $\pi_a \cdot \pi_{\bar a} = e^{1/RT}$. The compositions are triplets $(x_a, x_{\bar a}, x_b)$ of the form $(k, k, n-2k)$, $0 \le k \le n/2$. The number of words of size $n$ having $p$ plateaux and $k$ occurrences of $a$ is given by 1 if $(p, k) = (0,0)$, and $s_{n,k,p,\theta} = \frac{1}{k}\binom{k}{p}\binom{k}{p-1}\binom{n-\theta p}{2k}$ otherwise. Consequently, the number of words having a given composition $(k, k, n-2k)$ is such that



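$M(k,k,n-2k) = \delta_{k,0} + \sum_{p \ge 1} s_{n,k,p,\theta} = \delta_{k,0} + \sum_{p \ge 1} \frac{1}{k}\binom{k}{p}\binom{k}{p-1}\binom{n-\theta p}{2k}$

where $\delta$ is the Kronecker symbol ($\delta_{a,b} = 1$ if $a = b$, and $0$ otherwise). Since $1 = \pi_b < \pi_a < \pi_{\bar a}$, the weights of words are increasing with the number of $a$. It follows that $W_{n,i} = (\pi_a\pi_{\bar a})^{i-1}$ and H2 is satisfied with $\nu(i) = (\pi_a\pi_{\bar a})^{i-1}$ and $\omega(n) = 1$.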
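The closed form for the number of structures with $k$ base-pairs can be cross-checked against a direct enumeration. The sketch below is only an illustration under stated assumptions: it reads $\theta$ as the minimal number of positions enclosed by any base-pair, counts structures by a standard first-position decomposition (unpaired first position, or first position paired with a later one), and compares the result with $\frac{1}{k}\binom{k}{p}\binom{k}{p-1}\binom{n-\theta p}{2k}$ summed over $p$. The function names are hypothetical.

```python
from math import comb


def structures_by_pairs(n_max, theta):
    """counts[n][k] = number of secondary structures of length n with k pairs.

    Assumption: every base-pair must enclose at least theta positions.
    """
    counts = [[1]]  # the empty structure
    for n in range(1, n_max + 1):
        row = [0] * (n // 2 + 1)
        # Case 1: the first position is unpaired.
        for k, c in enumerate(counts[n - 1]):
            row[k] += c
        # Case 2: the first position is paired, enclosing `inner` positions.
        for inner in range(theta, n - 1):
            outer = n - 2 - inner
            for k1, c1 in enumerate(counts[inner]):
                for k2, c2 in enumerate(counts[outer]):
                    row[k1 + k2 + 1] += c1 * c2
        counts.append(row)
    return counts


def multiplicity(n, k, theta):
    """Closed form: Narayana number times placements of unpaired positions."""
    if k == 0:
        return 1
    total = 0
    for p in range(1, k + 1):
        if n - theta * p < 0:
            continue  # not enough room for p plateaux
        narayana = comb(k, p) * comb(k, p - 1) // k
        total += narayana * comb(n - theta * p, 2 * k)
    return total


counts = structures_by_pairs(8, 1)
print(counts[6], [multiplicity(6, k, 1) for k in range(4)])
```

Multiplying each entry `counts[n][k]` by $(\pi_a\pi_{\bar a})^k$ and summing over $k$ gives the total weight of the structures of length $n$.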

Moreover, the multiplicity of the weight $W_{n,i}$ is the number of words having sub-composition $(x_a, x_{\bar a})$ of the form $(i-1, i-1)$, and is given by

$M_{n,i} = \delta_{i-1,0} + \sum_{p \ge 1} \frac{1}{i-1}\binom{i-1}{p}\binom{i-1}{p-1}\binom{n-\theta p}{2(i-1)} \sim \frac{n^{2(i-1)}}{i\,(i-1)!^2}.$



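Indeed, for large values of $n$, the scope of the sum above can be limited to $p \in [1, i-1]$ since any term such that $p > (i-1)$ has null contribution. For $i \ge 2$, each remaining binomial $\binom{n-\theta p}{2(i-1)}$ is equivalent to $n^{2(i-1)}/(2(i-1))!$, and the coefficients $\frac{1}{i-1}\binom{i-1}{p}\binom{i-1}{p-1}$ (Narayana numbers) sum to the Catalan number $\frac{1}{i}\binom{2(i-1)}{i-1}$.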



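One also has $M_{n,i} \le \frac{n^{2(i-1)}}{i\,(i-1)!^2}$ for all $i$, thus H1 is satisfied with

$f_1(i) = 2(i-1)$, $g_1(n) = \log n$, $H(i) = i\,(i-1)!^2$,

where $\sum_i 1/H(i)$ obviously converges, and the verification of H3 is immediate. Setting

$\mu(n) = \sum_{k=0}^{\lfloor n/2 \rfloor} \Big(\delta_{k,0} + \sum_{p \ge 1} \frac{1}{k}\binom{k}{p}\binom{k}{p-1}\binom{n-\theta p}{2k}\Big)\,(\pi_a\pi_{\bar a})^k,$

one verifies, e.g. from the strong-connectivity of the grammar [7], that

$\mu(n) \sim c \cdot \Big(\frac{1}{\rho_\theta}\Big)^{n} \cdot n^{-3/2},$

where $c$ is a constant, and $\rho_\theta$ is the dominant singularity of $\sum_{n \ge 0} \mu(n)\, z^n$.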

Proposition 4.12 The expected waiting time for the collection of Boltzmann-factor weighted RNA secondary structures of length $n$, assuming $1 = \pi_b < \pi_a < \pi_{\bar a}$, admits the following asymptotic behavior:

$E[D_n] \sim \kappa\, \frac{\rho_\theta^{-n}}{n\sqrt{n}}\, \log n, \quad \forall \theta \in \mathbb{N}^+$

where $\kappa = t^*\big(2(i-1),\ (\pi_a\pi_{\bar a})^{i-1}\big)$ and, setting $q = \pi_a\pi_{\bar a}$, $\rho_\theta$ is the smallest positive real solution of

$1 - 4z + (6 - 2q)z^2 + 4(q-1)z^3 + (1-2q)z^4 - 2q\,z^{\theta+2} + 4q\,z^{\theta+3} - 2q(1+q)\,z^{\theta+4} + q^2 z^{2\theta+4} = 0.$

Corollary 4.13 Define $\eta_\theta$ as the smallest positive solution of the equation

$1 - 4z + 4z^2 - z^4 - 2z^{\theta+2} + 4z^{\theta+3} - 4z^{\theta+4} + z^{2\theta+4} = 0.$

Then the number $m$ of RNA structures of length $n$ is asymptotically equal to $\lambda \cdot \eta_\theta^{-n} \cdot n^{-3/2}$, and the asymptotic waiting time of the full collection is given by

E[C m ] ~K-m p ■ (logm) 3 ^ 2 • log log to, 

where p = — 1°| ^ , and A and K are constants that can be fully specified. 

4.3.4 A non strongly-connected language 

Let us finally consider the language $\mathcal{L}^{(nc)}$ over an alphabet $\{a, \bar a, b\}$ and generated by the grammar

$S \to \bar a\,S\,b\,U \mid \varepsilon$ and $U \to a\,U\,b\,U \mid \varepsilon$.

It is worth noticing that this grammar is not strongly connected, and the distributions of letters may therefore be atypical (non-normal and/or expectation/variance not in $\Theta(n)$ [7]). Here, this grammar models binary trees, whose leftward edges along the leftmost branch are marked by a dedicated letter $\bar a$, and any other leftward (resp. rightward) edge is marked by $a$ (resp. $b$).

The restriction of $\mathcal{L}^{(nc)}$ to words of odd length is empty, thus we only study the word collector on even sizes. The structure of this grammar is such that each word of size $n$ has exactly $n/2$ occurrences of the letter $b$, and compositions are therefore triplets $(x_a, x_{\bar a}, x_b)$ of the form $(n/2 - k, k, n/2)$, for $1 \le k \le n/2$. An elementary computation shows that the number of words of a given composition is

$M(n/2-k,\, k,\, n/2) = \binom{n-k-1}{n/2-1} - \binom{n-k-1}{n/2}.$
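The difference of binomials above can be compared with a direct count on the grammar. The sketch below is an illustration, not the computation used in the paper: it assumes that $k$ counts the occurrences of the letter produced by the rule for $S$ (the letter marking the leftmost branch), counts the words derived from $U$ and $S$ by recursion on their length, and checks the closed form. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(n):
    """Number of words of length n derived from U -> x U b U | empty."""
    if n == 0:
        return 1
    if n % 2:
        return 0
    return sum(count_u(i) * count_u(n - 2 - i) for i in range(0, n - 1, 2))


@lru_cache(maxsize=None)
def count_s(n, k):
    """Words of length n derived from S with k letters produced by the S rule."""
    if n == 0:
        return 1 if k == 0 else 0
    if n % 2 or k == 0:
        return 0
    # S -> y S b U: one S-rule letter, an inner S-word, one b, then a U-word.
    return sum(count_s(i, k - 1) * count_u(n - 2 - i) for i in range(0, n - 1, 2))


def ballot_formula(n, k):
    """Closed form C(n-k-1, n/2-1) - C(n-k-1, n/2)."""
    return comb(n - k - 1, n // 2 - 1) - comb(n - k - 1, n // 2)


n = 8
print([count_s(n, k) for k in range(1, n // 2 + 1)])
print([ballot_formula(n, k) for k in range(1, n // 2 + 1)])
```

Summing either list over $k$ gives the Catalan number of index $n/2$, i.e. the total number of words of length $n$.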



The expected waiting time depends on the relative position of the weights associated with letters, leading to different behaviors. Let us illustrate the approach on one out of the 9 possible configurations, such that

$1 = \pi_b < \pi_a < \pi_{\bar a}$.

In this case, the sub-compositions are pairs $(x_a, x_{\bar a})$ of the form $(n/2 - k, k)$, for $1 \le k \le n/2$. Moreover, since the weight of the word increases with the number of $\bar a$, then one has $W_{n,i} = \pi_a^{\,n/2-i}\,\pi_{\bar a}^{\,i}$, and H2 is therefore satisfied with $\nu(i) = (\pi_{\bar a}/\pi_a)^i$ and $\omega(n) = \pi_a^{\,n/2}$.

Remark 2 The influence of the configuration (ordering of the weights) only appears in the definition of the functions $\nu$ and $\omega$. The function $\omega$ may become constant (equal to 1) when either $\pi_b = \pi_a = 1$ or $\pi_b = \pi_{\bar a} = 1$.

Now the number of words having the $i$-th weight, i.e. the sub-composition $(n/2 - i, i)$, is given by

$M_{n,i} = \binom{n-i-1}{n/2-1} - \binom{n-i-1}{n/2} \sim \sqrt{\frac{2}{\pi}}\; i\; 2^{n-i}\; n^{-3/2}.$ (4.10)

Since $M_{n,i} \le 2^{n-i+1}\, n^{-3/2}\, \sqrt{2/\pi}\; i$ for all $1 \le i \le n/2$ and $\sum_i i/2^i$ converges, the condition H1 is satisfied for the following functions:

$F(i) := (f_1(i), f_2(i)) = \big(\log 2,\ -\tfrac{3}{2}\big)$, $G(n) := (g_1(n), g_2(n)) = (n, \log n)$, $H(i) = \sqrt{\pi/2}\;\frac{2^i}{i}$.



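The verification of H3 is immediate.

From (4.10), one has $\mu(n) = \sum_{k=1}^{n/2} \Big[\binom{n-k-1}{n/2-1} - \binom{n-k-1}{n/2}\Big]\, \pi_a^{\,n/2-k}\, \pi_{\bar a}^{\,k}$, whose asymptotic behaviour obeys

$\mu(n) \sim 2\sqrt{\frac{2}{\pi}}\; \frac{\pi_a \pi_{\bar a}}{(2\pi_a - \pi_{\bar a})^2}\; (2\sqrt{\pi_a})^{n}\, n^{-3/2}.$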



Proposition 4.14 The expected waiting time for obtaining all words in $\mathcal{L}^{(nc)}$ of even length $n$, under the configuration $1 = \pi_b < \pi_a < \pi_{\bar a}$, admits the following asymptotic behavior:



E[D n ] ~/s~, 



w/zere k = t* (log 2, (7r a /7r a ) 4 



2 y/27T a 7Ta 

(27r a - 7 r a )2- 



Corollary 4.15 Let $m$ be the number of words of even length $n$ in $\mathcal{L}^{(nc)}$, asymptotically equivalent to $2\sqrt{2/\pi} \cdot 2^n \cdot n^{-3/2}$. The expected waiting time of the full collection is

E[C m ] ~ K- TO- (log TO) 5 / 2 , 

where k is a constant that can be explicitly computed. 

Again, these results can be extended to any relative ordering of $\pi_a$, $\pi_{\bar a}$ and $\pi_b$, and one obtains the following result.

Proposition 4.16 The expected waiting time for all words of even length $n$ in $\mathcal{L}^{(nc)}$ is equivalent to

{K ■ -C= ifn a = 1, Or 1 = 7T 6 < 7T a < 7T a , 

. 2" • i^S otherwise, 

where $\kappa$ and $\kappa'$ are constants that can be explicitly computed.

Corollary 4.17 Let $m$ be the number of words of even length $n$ in $\mathcal{L}^{(nc)}$. Then the expected waiting time of the complete collection is asymptotically equal to

E y c j f K ■ TO 2 • (l0gm) 5/2 iflT a = 1 Or 7T b = 1 < 7T a < 7T a , 

I k' • to 2+9 • (logm) 2+9//2 • loglogrn otherwise, with q = log 2 (7r a /7Ta) 
where $\kappa$ and $\kappa'$ are constants that can be explicitly computed.

5 Conclusion

In this extended abstract, we studied a language generalization of the ubiquitous Coupon Collector Problem. Focusing on collections of weighted coupons having large multiplicities, we contributed a new theorem that relates the asymptotic waiting time of the full collection to the growth of the multiplicity of coupons of a given weight. We compared the novelty of the contribution against pre-existing work on the subject. We discussed the application of our theorem to weighted languages in general, and particularly to four languages showing different properties (rational vs context-free, simple-type vs non-square-root singularities, limited vs parameterized alphabet...).

Quite interestingly, our study of four illustrative examples reveals a large variety of expressions for the waiting time. As a function of the word length $n$, we observed waiting times of the form $\kappa \cdot \mu(n) \cdot n$ and $\kappa \cdot \mu(n) \cdot \log(n)$, depending essentially on the multiplicity of the smallest weights. As a function of the number of coupons $m$, we obtained estimates of the general form $\kappa \cdot m^p \cdot (\log m)^q \cdot (\log\log m)^{\delta}$, where $p$ and $q$ are irrational numbers and $\delta \in \{0, 1\}$. Such a diversity not only arises from differences regarding the nature of the asymptotic growth within the language, but also reflects subtle differences in the accumulation of the contributions of the least probable words. In our opinion, this illustrates the versatility of the method, and hints at the significant amount of work that would be required in the case of approximations [2].

Perhaps the main limitation of our work lies in the prerequisites of Theorem 2.1. As shown in Section 4, 
verifying these - technically involved - conditions is already made easier in the context of languages. 
However, one could imagine characterizing broad classes of languages that automatically verify these 
conditions. For instance, conditions of aperiodicity (a.k.a. lattice-type [9]) and strong-connectivity of 
a context-free grammar are known to ensure typical asymptotic growths, both for the total number of 
words, their cumulated weight and the total number of words of a given composition [7]. We hope that 
such conditions, possibly in addition to other easily-checkable properties, could provide a sufficient set of 
conditions for a given regime. 

Another natural extension may generalize the results to multi-parameterized combinatorial classes, as 
generated by decomposable combinatorial classes [10]. The main difficulties behind such an extension are 
related to the variety of asymptotic growths that may appear, e.g. for the substitution construct, in addition 
to an increased level of difficulty for determining the number of words of a given composition/weight. This 
both motivates a further relaxation of the - sufficient but not necessary - conditions of Theorem 2.1, along 
with a study of accessible asymptotics for the growth of coefficients in multivariate generating functions. 

Acknowledgements 

The authors wish to thank an anonymous reviewer for suggesting a more intuitive presentation of our main 
result. This work was supported by the French Agence Nationale de la Recherche through the BOOLE 
ANR 9 BLAN 1 1 (JDB and DG) and MAGNUM ANR 2010 BLAN 02 04 (YP) grants. 

References 

[1] I. Adler, S. Oren, and S. Ross, The coupon collector's problem revisited, Journal of Applied Probability 40 (2003), no. 2, 513-518.

[2] P. Berenbrink and T. Sauerwald, The weighted coupon collector's problem and applications, 15th 
International Computing and Combinatorics Conference (COCOON' 10), 2009. 




[3] O. Bodini and Y. Ponty, Multi-dimensional Boltzmann sampling of languages, Proceedings of 
AOFA' 10 (Vienna), DMTCS Proceedings, June 2010, pp. 49-64. 

[4] Shahar Boneh and Vassilis G. Papanicolaou, General asymptotic estimates for the coupon collector 
problem, J. Comput. Appl. Math. 67 (1996), no. 2, 277-289. 

[5] A. Denise, Y. Ponty, and M. Termier, Controlled non-uniform random generation of decomposable 
structures, Theoretical Computer Science 411 (2010), no. 40-42, 3527 - 3552. 

[6] Y. Ding and E. Lawrence, A statistical sampling algorithm for RNA secondary structure prediction, 
Nucleic Acids Research 31 (2003), no. 24, 7280-7301. 

[7] M. Drmota, Systems of functional equations, Random Struct. Alg. 10 (1997), 103-124. 

[8] P. Flajolet, D. Gardy, and L. Thimonier, Birthday paradox, coupon collectors, caching algorithms 
and self-organizing search, Discrete Appl. Math. 39 (1992), no. 3, 207-229. 

[9] P. Flajolet and R. Sedgewick, Analytic combinatorics, Cambridge University Press, 2009. 

[10] P. Flajolet, P. Zimmermann, and B. Van Cutsem, Calculus for the random generation of labelled 
combinatorial structures, Theoretical Computer Science 132 (1994), 1-35. 

[11] D. Gardy, Occupancy urn models in the analysis of algorithms, Journal of Statistical Planning and 
Inference 101 (2002), no. 1-2, 95 - 105. 

[12] Daniele Gardy and Yann Ponty, Weighted random generation of context-free languages: Analysis of 
collisions in random urn occupancy models, Proceedings of GASCom' 10, 2010. 

[13] J.S. McCaskill, The equilibrium partition function and base pair binding probabilities for RNA secondary structure, Biopolymers 29 (1990), 1105-1119.

[14] Peter Neal, The generalised coupon collector problem, Journal of Applied Probability 45 (2008), no. 3, 621-629.

[15] R. Nussinov and A.B. Jacobson, Fast algorithm for predicting the secondary structure of single-stranded RNA, Proc Natl Acad Sci U S A 77 (1980), 6903-13.

[16] S.M. Ross, Introduction to probability models, 10th ed., Elsevier Science, 2009. 

[17] M. Vauchaussade de Chaumont and G. Viennot, Polynômes orthogonaux et problèmes d'énumération en biologie moléculaire, Séminaire Lotharingien de Combinatoire (1983).

[18] M. Vauchaussade de Chaumont and X.G. Viennot, Enumeration of RNA's secondary structures by complexity, Mathematics in Medicine and Biology (V. Capasso, E. Grosso, and S.L. Paveri-Fontana, eds.), Lecture Notes in Biomathematics, vol. 57, 1985, pp. 360-365.

[19] M. S. Waterman, Secondary structure of single stranded nucleic acids, Advances in Mathematics 
Supplementary Studies 1 (1978), no. 1, 167-212. 



A Proof of Theorem 2.1


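For the proof of the theorem, we need the following lemma.

Lemma A.1 Let $E \subset \mathbb{N}^*$. Let $f$ and $g$ be two non-zero positive functions on $E$, such that if $E$ is not finite, $\lim_{x \to \infty} g(x)/f(x) = +\infty$. Then,

- $\exists\, t^*(f,g) > 0$ such that

(1) $\forall\, 0 < t < t^*(f,g)$, $\exists\, x_0 \in E$, $f(x_0) - t\,g(x_0) > 0$

(2) $\forall\, t > t^*(f,g)$, $\forall\, x \in E$, $f(x) - t\,g(x) < 0$

- $\exists\, x_t \in \mathbb{N}^*$ such that

(3) $f(x_t) - g(x_t)\,t = \max_{x \in E}\big(f(x) - t\,g(x)\big)$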

Proof.

Throughout the proof, $f(x) - t\,g(x)$ is seen as a function of $x$ with a parameter $t$. Let us define $t_x = f(x)/g(x)$, for all $x \in E$. $\forall\, t < t_x$, $f(x) - t\,g(x) > 0$ and $\forall\, t > t_x$, $f(x) - t\,g(x) < 0$.

If $E$ is finite, it is obvious that $t_x$ reaches its maximum, i.e. there is $X \in E$ such that $t_X = \max_{x \in E} t_x$. This property is still true when $E$ is not finite because $t_x \to 0$ as $x \to \infty$. Then, (1) and (2) are satisfied, taking $t^*(f,g) = t_X$.

If $E$ is finite, it is obvious that $f(x) - t\,g(x)$ reaches its maximum for all $t > 0$. If $E$ is not finite, using the fact that $\lim_{x\to\infty} g(x)/f(x) = +\infty$, we have $\forall\, t > 0$, $f(x) - t\,g(x) \to -\infty$. Then $f(x) - t\,g(x)$ reaches its maximum, i.e. there is $x_t \in E$ such that $f(x_t) - g(x_t)\,t = \max_{x \in E}\big(f(x) - g(x)\,t\big)$, which proves (3). □
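The threshold of Lemma A.1 is simply the largest ratio $f(x)/g(x)$, which is easy to compute on an example. The sketch below is illustrative only: the functions $f(x) = x$ and $g(x) = 2^x$ are made up for the occasion, and the infinite set $E$ is truncated to a finite range, which is harmless here because the ratio tends to 0.

```python
from fractions import Fraction


def t_star(f, g, domain):
    """Largest ratio f(x)/g(x) over a finite domain (integer-valued f and g)."""
    return max(Fraction(f(x), g(x)) for x in domain)


def argmax_gap(f, g, t, domain):
    """A point x_t maximising f(x) - t*g(x) over the domain."""
    return max(domain, key=lambda x: f(x) - t * g(x))


def f(x):
    return x  # hypothetical example function


def g(x):
    return 2 ** x  # hypothetical example function, g/f tends to infinity


domain = range(1, 60)  # finite truncation of E
threshold = t_star(f, g, domain)
print(threshold)
# Below the threshold some x gives a positive gap, above it none does.
print(any(f(x) - Fraction(2, 5) * g(x) > 0 for x in domain))
print(all(f(x) - Fraction(3, 5) * g(x) < 0 for x in domain))
print(argmax_gap(f, g, Fraction(1, 10), domain))
```

Exact rationals are used so that the sign tests of properties (1) and (2) are not affected by rounding.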

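Proof of the theorem.

Let us suppose that $W_m$ satisfies H1, H2, and H3. From equation (1.1), we have

$E[C_m] = \int_0^{\infty} \Big(1 - \prod_{i=1}^{|W_m|} \big(1 - e^{-\frac{W_{m,i}}{\mu_m}\,u}\big)^{M_{m,i}}\Big)\, du.$

The substitution $u = \frac{\mu_m \sum_{j=1}^{p} g_j(m)}{\omega(m)}\; t$ gives

$E[C_m] = \frac{\mu_m \sum_{j=1}^{p} g_j(m)}{\omega(m)} \int_0^{\infty} \Big(1 - \prod_{i=1}^{|W_m|} \big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big)^{M_{m,i}}\Big)\, dt$

$\phantom{E[C_m]} = \frac{\mu_m \sum_{j=1}^{p} g_j(m)}{\omega(m)} \int_0^{\infty} \Big(1 - \exp\Big(\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big)\Big)\Big)\, dt.$

From H1, we have $\sum_{j=1}^{p} g_j(m) \sim g_1(m)$. To conclude, we have to show that the integral converges when $m$ goes to infinity. First, we show that the integral from $0$ to $t^*(f_1,\nu)$ converges to $t^*(f_1,\nu)$. Then, we show that the remaining integral converges to 0.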



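• From Lemma A.1, applied to $E$, and H3 (if $|W_m| \to \infty$), for $0 < t < t^*(f_1,\nu)$ there is $i_0 \in E$ such that $f_1(i_0) - \nu(i_0)\,t > 0$. Moreover, from H2, for $m$ sufficiently large, $W_{m,i_0} = \nu(i_0)\,\omega(m)$. Then

$\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big) \le -\sum_{i=1}^{|W_m|} M_{m,i}\, e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)} \le -M_{m,i_0}\, e^{-\frac{W_{m,i_0}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)} = -M_{m,i_0}\, e^{-\nu(i_0)\, t \sum_{j=1}^{p} g_j(m)}.$

From H1, for $m$ sufficiently large, $M_{m,i_0} \ge \frac{1}{2h(i_0)}\, e^{\sum_{j=1}^{p} f_j(i_0)\, g_j(m)}$. Then,

$\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big) \le -\frac{1}{2h(i_0)}\, e^{\sum_{j=1}^{p} (f_j(i_0) - \nu(i_0)t)\, g_j(m)}.$

As $f_1(i_0) - \nu(i_0)\,t > 0$ and $g_j(m) = o(g_1(m))$ for all $j > 1$, $\sum_{j=1}^{p} (f_j(i_0) - \nu(i_0)t)\, g_j(m) \to +\infty$. Then,

$\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big) \xrightarrow[m \to \infty]{} -\infty,$

and

$\prod_{i=1}^{|W_m|} \big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big)^{M_{m,i}} \xrightarrow[m \to \infty]{} 0.$

This leads to

$\int_0^{t^*(f_1,\nu)} \Big(1 - \prod_{i=1}^{|W_m|} \big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big)^{M_{m,i}}\Big)\, dt \to t^*(f_1,\nu).$ (A.1)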



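• By definition, $W_{m,i}/\omega(m)$ is increasing in $i$, and from H2, for $m$ sufficiently large, $W_{m,1}/\omega(m) = \nu(1)$. Moreover, $\sum_{j=1}^{p} g_j(m) \sim g_1(m) \to +\infty$, from H1. Then, for $m$ sufficiently large, $\forall\, t > t^*(f_1,\nu)$, $e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)} < \frac{1}{2}$. Using $\log(1-x) \ge -2x$ for all $x \le 1/2$, we have

$\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big) \ge -2 \sum_{i=1}^{|W_m|} M_{m,i}\, e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}.$

From H1, we have that for all $i$, $M_{m,i} \le \frac{1}{h(i)}\, e^{\sum_{j=1}^{p} f_j(i)\, g_j(m)}$. From H2, for all $i$, $W_{m,i} \ge \nu(i)\,\omega(m)$. Thus,

$\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big) \ge -2 \sum_{i=1}^{|W_m|} \frac{1}{h(i)}\, e^{\sum_{j=1}^{p} (f_j(i) - \nu(i)t)\, g_j(m)}.$

$\forall\, t > t^*(f_1,\nu)$, we have $f_1(i) - \nu(i)\,t < 0$ for all $i \le |W_m|$. From H3, there exists $K > 0$ such that for all $1 < j \le p$, for all $i \le |W_m|$ and for all $t > t^*(f_1,\nu)$, $(f_j(i) - \nu(i)\,t) \le K$. Then, $\forall\, i \in E$,

$\sum_{j=1}^{p} (f_j(i) - \nu(i)t)\, g_j(m) \le (f_1(i) - \nu(i)t)\, g_1(m) + K \sum_{j=2}^{p} g_j(m).$

For all $j \ne 1$ we have $g_j = o(g_1)$. Thus, for $m$ sufficiently large, $\sum_{j=1}^{p} (f_j(i) - \nu(i)t)\, g_j(m) \le \frac{1}{2}\,(f_1(i) - \nu(i)t)\, g_1(m)$. Then,

$\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big) \ge -2 \sum_{i=1}^{|W_m|} \frac{1}{h(i)}\, e^{\frac{1}{2}(f_1(i) - \nu(i)t)\, g_1(m)} \ge -2\, e^{\frac{1}{2} g_1(m) \max_{i \le |W_m|}(f_1(i) - \nu(i)t)} \sum_{i=1}^{|W_m|} \frac{1}{h(i)}.$

From H1, there is $C > 0$ such that $\sum_{i} \frac{1}{h(i)} \le C$. Moreover, we obviously have

$\max_{i \le |W_m|} (f_1(i) - \nu(i)t) \le \max_{i \in \mathbb{N}} (f_1(i) - \nu(i)t),$

which leads to

$\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big) \ge -2C\, e^{\frac{1}{2} g_1(m) \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t)}.$

Then,

$\int_{t^*(f_1,\nu)}^{\infty} \Big(1 - \exp\Big(\sum_{i=1}^{|W_m|} M_{m,i} \log\big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big)\Big)\Big)\, dt \le \int_{t^*(f_1,\nu)}^{\infty} \Big(1 - e^{-2C\, e^{\frac{1}{2} g_1(m) \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t)}}\Big)\, dt \le 2C \int_{t^*(f_1,\nu)}^{\infty} e^{\frac{1}{2} g_1(m) \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t)}\, dt.$

Choose $t^+ > t^*(f_1,\nu)$, without any other assumption. As for all $t > t^*(f_1,\nu)$, $\max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t) < 0$, and $g_1(m) \to +\infty$, we have $e^{\frac{1}{2} g_1(m) \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t)} \to 0$. Then

$\int_{t^*(f_1,\nu)}^{t^+} e^{\frac{1}{2} g_1(m) \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t)}\, dt \to 0.$

Besides, for all $t > t^+$, we have $f_1(i) - \nu(i)\,t \le f_1(i)\,\frac{t}{t^+} - \nu(i)\,t$, hence

$\max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t) \le \max_{i \in \mathbb{N}}\Big(f_1(i)\,\frac{t}{t^+} - \nu(i)t\Big) = \frac{t}{t^+} \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t^+).$

From Lemma A.1 and H3, this last maximum, denoted $-\gamma$, is actually reached and we have $-\gamma = \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t^+) < 0$. Then,

$\int_{t^+}^{\infty} e^{\frac{1}{2} g_1(m) \max_{i \in \mathbb{N}}(f_1(i) - \nu(i)t)}\, dt \le \int_{t^+}^{\infty} e^{-\frac{1}{2}\gamma\, g_1(m)\, \frac{t}{t^+}}\, dt = \frac{e^{-\frac{1}{2}\gamma\, g_1(m)}}{\frac{1}{2}\gamma\, g_1(m)}\; t^+ \to 0,$

and finally,

$\int_{t^*(f_1,\nu)}^{\infty} \Big(1 - \prod_{i=1}^{|W_m|} \big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big)^{M_{m,i}}\Big)\, dt \xrightarrow[m \to \infty]{} 0.$ (A.2)

• Equations (A.1) and (A.2) lead to

$\int_0^{\infty} \Big(1 - \prod_{i=1}^{|W_m|} \big(1 - e^{-\frac{W_{m,i}}{\omega(m)}\, t \sum_{j=1}^{p} g_j(m)}\big)^{M_{m,i}}\Big)\, dt \xrightarrow[m \to \infty]{} t^*(f_1,\nu).$ (A.3)

And finally, using H1 and equation (A.3),

$E[C_m] \sim t^*(f_1,\nu)\, \frac{\mu_m}{\omega(m)} \sum_{j=1}^{p} g_j(m) \sim t^*(f_1,\nu)\, \frac{\mu_m}{\omega(m)}\, g_1(m).$

□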
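The integral representation used at the start of the proof, $E[C_m] = \int_0^\infty \big(1 - \prod_i (1 - e^{-W_{m,i} u/\mu_m})^{M_{m,i}}\big)\,du$, can be evaluated numerically on small instances. The sketch below is a plain composite Simpson rule written for illustration; the truncation point of the integral and the number of steps are arbitrary choices, not values taken from the paper, and the function name is hypothetical.

```python
import math


def waiting_time_integral(classes, steps=20000):
    """Numerically integrate 1 - prod_i (1 - exp(-p_i u))**M_i over u >= 0.

    classes: list of (weight, multiplicity); p_i = weight / total weight.
    The integral is truncated where the rarest coupon is missed with
    probability about exp(-60), and `steps` must be even (Simpson's rule).
    """
    mu = sum(w * m for w, m in classes)
    probs = [(w / mu, m) for w, m in classes]
    upper = 60.0 / min(p for p, _ in probs)

    def integrand(u):
        prod = 1.0
        for p, m in probs:
            prod *= (1.0 - math.exp(-p * u)) ** m
        return 1.0 - prod

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


# Three equiprobable coupons: the classical value is 3 * (1 + 1/2 + 1/3).
print(waiting_time_integral([(1.0, 3)]))
# Two coupons with probabilities 1/3 and 2/3.
print(waiting_time_integral([(1.0, 1), (2.0, 1)]))
```

For larger collections the product underflows gracefully, but the truncation point grows like the inverse of the smallest probability, so the step count should be increased accordingly.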



